Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding

Abstract

Visual context provides grounding information for multimodal machine translation (MMT). However, previous MMT models and probing studies on visual features suggest that visual information is less explored in MMT as it is often redundant to textual information. In this paper, we propose an Object-level Visual Context modeling framework (OVC) to efficiently capture and explore visual information for multimodal machine translation. With detected objects, the proposed OVC encourages MMT to ground translation on desirable visual objects by masking irrelevant objects in the visual modality. We equip the proposed OVC with an additional object-masking loss to achieve this goal. The object-masking loss is estimated according to the similarity between masked objects and the source texts so as to encourage masking source-irrelevant objects. Additionally, in order to generate vision-consistent target words, we further propose a vision-weighted translation loss for OVC. Experiments on MMT datasets demonstrate that the proposed OVC model outperforms state-of-the-art MMT models, and analyses show that masking irrelevant objects helps grounding in MMT.
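
The idea of masking source-irrelevant objects by their similarity to the source text can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes a learned object-masking loss, whereas the sketch below swaps in a hard cosine-similarity threshold purely for illustration. The function names (`cosine`, `mask_irrelevant_objects`), the `threshold` parameter, and the toy two-dimensional vectors are all assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mask_irrelevant_objects(object_vecs, text_vec, threshold=0.5):
    """Zero out object vectors whose similarity to the text vector is below threshold.

    Hypothetical simplification: a hard threshold stands in for the learned
    object-masking loss mentioned in the abstract.
    Returns (keep_flags, masked_vectors).
    """
    keep = [cosine(obj, text_vec) >= threshold for obj in object_vecs]
    masked = [obj if k else [0.0] * len(obj) for obj, k in zip(object_vecs, keep)]
    return keep, masked


# Toy example: three detected objects and one source-sentence vector.
objects = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
text = [1.0, 0.0]
keep, masked = mask_irrelevant_objects(objects, text)
print(keep)    # which objects are kept as relevant to the source text
print(masked)  # irrelevant objects replaced by zero vectors
```

In this toy setting the second object is orthogonal to the text vector and is therefore masked, while the other two are kept. A trained model would instead learn which objects to mask through a loss term rather than a fixed cutoff.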

Similar resources

SHEF-Multimodal: Grounding Machine Translation on Images

This paper describes the University of Sheffield’s submission for the WMT16 Multimodal Machine Translation shared task, where we participated in Task 1 to develop German-to-English and English-to-German statistical machine translation (SMT) systems in the domain of image descriptions. Our proposed systems are standard phrase-based SMT systems based on the Moses decoder, trained only on the provi...

Sheffield MultiMT: Using Object Posterior Predictions for Multimodal Machine Translation

This paper describes the University of Sheffield’s submission to the WMT17 Multimodal Machine Translation shared task. We participated in Task 1 to develop an MT system to translate an image description from English to German and French, given its corresponding image. Our proposed systems are based on the state-of-the-art Neural Machine Translation approach. We investigate the effect of replaci...

An Efficient Character-Level Neural Machine Translation

Neural machine translation aims at building a single large neural network that can be trained to maximize translation performance. The encoder-decoder architecture with an attention mechanism achieves a translation performance comparable to the existing state-of-the-art phrase-based systems on the task of English-to-French translation. However, the use of large vocabulary becomes the bottleneck...

Object-Level Context Modeling For Scene Classification with Context-CNN

Convolutional Neural Networks (CNNs) have been used extensively for computer vision tasks and produce rich feature representation for objects or parts of an image. But reasoning about scenes requires integration between the low-level feature representations and the high-level semantic information. We propose a deep network architecture which models the semantic context of scenes by capturing ob...

Exploiting Document-Level Context for Data-Driven Machine Translation

This paper presents a method for exploiting document-level similarity between the documents in the training corpus for a corpus-driven (statistical or example-based) machine translation system and the input documents it must translate. The method is simple to implement, efficient (increases the translation time of an example-based system by only a few percent), and robust (still works even when ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i4.16376